Load the object

## see ?STexampleData and browseVignettes('STexampleData') for documentation
## loading from cache

Check the object structure

## class: SpatialExperiment 
## dim: 33538 4992 
## metadata(0):
## assays(1): counts
## rownames(33538): ENSG00000243485 ENSG00000237613 ... ENSG00000277475
##   ENSG00000268674
## rowData names(3): gene_id gene_name feature_type
## colnames(4992): AAACAACGAATAGTTC-1 AAACAAGTATCTCCCA-1 ...
##   TTGTTTGTATTACACG-1 TTGTTTGTGTAAATTC-1
## colData names(7): barcode_id sample_id ... ground_truth cell_count
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
## imgData names(4): sample_id image_id data scaleFactor

Check number of features/genes (rows) and spots (columns)

## [1] 33538  4992

Check names of ‘assay’ tables

## [1] "counts"

Counts table and gene metadata

Counts table (an object of class dgTMatrix, which is a sparse matrix)

## 6 x 4 sparse Matrix of class "dgTMatrix"
##                 AAACAACGAATAGTTC-1 AAACAAGTATCTCCCA-1 AAACAATCTACTAGCA-1
## ENSG00000243485                  .                  .                  .
## ENSG00000237613                  .                  .                  .
## ENSG00000186092                  .                  .                  .
## ENSG00000238009                  .                  .                  .
## ENSG00000239945                  .                  .                  .
## ENSG00000239906                  .                  .                  .
##                 AAACACCAATAACTGC-1
## ENSG00000243485                  .
## ENSG00000237613                  .
## ENSG00000186092                  .
## ENSG00000238009                  .
## ENSG00000239945                  .
## ENSG00000239906                  .

Genes with some level of expression

## 21 x 11 sparse Matrix of class "dgTMatrix"
##   [[ suppressing 11 column names 'CGCGCAAGGAACTACA-1', 'CGCGCATGTTTGATTG-1', 'CGCGCCCGACTTAATA-1' ... ]]
##                                      
## ENSG00000223764 . . . . . . . . . . .
## ENSG00000187634 . . . . . . . . . . .
## ENSG00000188976 . . 2 . . . . . . 1 1
## ENSG00000187961 . . . . . . . . . . .
## ENSG00000187583 . . . . . . . . . . .
## ENSG00000187642 . . . . . . . . . . .
## ENSG00000272512 . . . . . . . . . . .
## ENSG00000188290 1 . . . . . . . . 2 .
## ENSG00000187608 . 1 . . . . 2 . . 1 .
## ENSG00000224969 . . . . . . . . . . .
## ENSG00000188157 . 1 . . 2 . . . . 1 .
## ENSG00000273443 . . . . . . . . . . .
## ENSG00000237330 . . . . . . . . . . .
## ENSG00000131591 . . . . . . . . . 1 .
## ENSG00000223823 . . . . . . . . . . .
## ENSG00000272141 . . . . . . . . . . .
## ENSG00000205231 . . . . . . . . . . .
## ENSG00000162571 . . . . . . . . . . .
## ENSG00000186891 . . . 1 . . . . . . .
## ENSG00000186827 . . . . . . . . . . .
## ENSG00000078808 . 1 2 . 1 . . . . 1 .
## 21 x 11 sparse Matrix of class "dgTMatrix"
##   [[ suppressing 11 column names 'CGCGCAAGGAACTACA-1', 'CGCGCATGTTTGATTG-1', 'CGCGCCCGACTTAATA-1' ... ]]
##                                                     
## ENSG00000160294  .  .   .  .   .   .  .  .  .   .  .
## ENSG00000228137  .  .   .  .   .   .  .  .  .   .  .
## ENSG00000239415  .  .   .  .   .   .  .  .  .   .  .
## ENSG00000182362  .  .   .  .   .   .  .  .  1   .  .
## ENSG00000160298  .  .   .  .   .   .  .  .  .   .  .
## ENSG00000160299  .  .   1  .   1   .  .  .  .   .  .
## ENSG00000160305  .  .   .  .   .   2  .  .  .   .  .
## ENSG00000160307  1  3   1  1   4   5  1  1  .   2  1
## ENSG00000160310  .  .   .  .   1   .  .  .  .   2  .
## ENSG00000198888 17 44  71 16 154  97 12 14 32 167  6
## ENSG00000198763 16 59  64 11 116  63 11 12 18 123  6
## ENSG00000198804 37 85 155 25 252 176 24 27 38 335 12
## ENSG00000198712 23 79 120 23 214 170 22 25 48 242 10
## ENSG00000228253  2  .   3  .   1   .  .  1  1   6  .
## ENSG00000198899 20 39  93  9 136 108 20 18 25 165  7
## ENSG00000198938 27 59 133 20 216 120 22 26 43 232  9
## ENSG00000198840  5 27  33  5  71  39  8 11 12  78  .
## ENSG00000212907  2  .   4  2   7   5  .  1  1   9  .
## ENSG00000198886 15 65  95  9 183  98 18 19 33 178  7
## ENSG00000198786  2 10  10  3  20  14  1  2  2  25  4
## ENSG00000198695  1  1   3  .   2   2  .  .  .   1  .

Gene metadata

## DataFrame with 6 rows and 3 columns
##                         gene_id   gene_name    feature_type
##                     <character> <character>     <character>
## ENSG00000243485 ENSG00000243485 MIR1302-2HG Gene Expression
## ENSG00000237613 ENSG00000237613     FAM138A Gene Expression
## ENSG00000186092 ENSG00000186092       OR4F5 Gene Expression
## ENSG00000238009 ENSG00000238009  AL627309.1 Gene Expression
## ENSG00000239945 ENSG00000239945  AL627309.3 Gene Expression
## ENSG00000239906 ENSG00000239906  AL627309.2 Gene Expression

Coordinates table and spot metadata

Check the spatial coordinates

##                    pxl_col_in_fullres pxl_row_in_fullres
## AAACAACGAATAGTTC-1               3913               2435
## AAACAAGTATCTCCCA-1               9791               8468
## AAACAATCTACTAGCA-1               5769               2807
## AAACACCAATAACTGC-1               4068               9505
## AAACAGAGCGACTCCT-1               9271               4151
## AAACAGCTTTCAGAAG-1               3393               7583

Spot-level metadata

## DataFrame with 6 rows and 7 columns
##                            barcode_id     sample_id in_tissue array_row
##                           <character>   <character> <integer> <integer>
## AAACAACGAATAGTTC-1 AAACAACGAATAGTTC-1 sample_151673         0         0
## AAACAAGTATCTCCCA-1 AAACAAGTATCTCCCA-1 sample_151673         1        50
## AAACAATCTACTAGCA-1 AAACAATCTACTAGCA-1 sample_151673         1         3
## AAACACCAATAACTGC-1 AAACACCAATAACTGC-1 sample_151673         1        59
## AAACAGAGCGACTCCT-1 AAACAGAGCGACTCCT-1 sample_151673         1        14
## AAACAGCTTTCAGAAG-1 AAACAGCTTTCAGAAG-1 sample_151673         1        43
##                    array_col ground_truth cell_count
##                    <integer>  <character>  <integer>
## AAACAACGAATAGTTC-1        16           NA         NA
## AAACAAGTATCTCCCA-1       102       Layer3          6
## AAACAATCTACTAGCA-1        43       Layer1         16
## AAACACCAATAACTGC-1        19           WM          5
## AAACAGAGCGACTCCT-1        94       Layer3          2
## AAACAGCTTTCAGAAG-1         9       Layer5          4

Have a look at the image metadata

## DataFrame with 2 rows and 4 columns
##       sample_id    image_id   data scaleFactor
##     <character> <character> <list>   <numeric>
## 1 sample_151673      lowres   ####   0.0450045
## 2 sample_151673       hires   ####   0.1500150

Retrieve the image

The position of a point in an image does not map directly to the spot location in Cartesian coordinates, because the origin (0,0) of an image is its top-left corner, not its bottom-left corner. To handle this, we need to transform the y-axis coordinates.

## [1] 600 600

Flip the y-axis
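A minimal base R sketch of the flip, under two assumptions that the output above does not confirm: that the 600 x 600 dimensions belong to the low-resolution image, and that its scale factor is the lowres value from the image metadata (0.0450045). The column name y_flipped is hypothetical; the three spots are copied from the coordinates table.

```r
# Three spots copied from the spatial coordinates table (full-resolution pixels)
xy <- data.frame(
  pxl_col_in_fullres = c(3913, 9791, 5769),
  pxl_row_in_fullres = c(2435, 8468, 2807),
  row.names = c("AAACAACGAATAGTTC-1", "AAACAAGTATCTCCCA-1", "AAACAATCTACTAGCA-1")
)

# Assumption: the image is 600 pixels high at low resolution,
# and low-res pixels = full-res pixels * scale_factor
lowres_height <- 600
scale_factor  <- 0.0450045

# Height of the image expressed in full-resolution pixels
fullres_height <- lowres_height / scale_factor

# Image rows count down from the top edge; Cartesian y counts up from the bottom,
# so the flipped coordinate is the distance from the bottom edge
xy$y_flipped <- fullres_height - xy$pxl_row_in_fullres

xy
```

Spots that were near the top of the image (small pxl_row_in_fullres) end up with large y values, so a scatter plot drawn with y_flipped has the same orientation as the tissue image.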

Add the annotation to the coordinate data frame

Whether a spot is “on tissue” or not can be used to colour the spots.

Calculating QC metrics

## [1] 33538  4992

Subset to keep only on-tissue spots

## [1] 33538  3639
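The subsetting step can be sketched in base R as follows. The in_tissue values are those of the first four barcodes in the spot-level metadata table; the counts are made-up toy values and the gene names are placeholders.

```r
# Annotation for the first four barcodes in the spot-level metadata table
spots <- data.frame(
  barcode_id = c("AAACAACGAATAGTTC-1", "AAACAAGTATCTCCCA-1",
                 "AAACAATCTACTAGCA-1", "AAACACCAATAACTGC-1"),
  in_tissue  = c(0L, 1L, 1L, 1L)
)

# Toy counts (made-up values): 3 genes x 4 spots, columns ordered like `spots`
toy_counts <- matrix(
  c(0, 2, 1, 0,
    5, 0, 3, 1,
    0, 0, 4, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("gene_a", "gene_b", "gene_c"), spots$barcode_id)
)

# Keep only the columns (spots) that overlap the tissue
keep <- spots$in_tissue == 1
counts_on_tissue <- toy_counts[, keep, drop = FALSE]
spots_on_tissue  <- spots[keep, ]

dim(counts_on_tissue)
```

With a SpatialExperiment object (called spe here as an assumption), the equivalent subset is usually written spe[, colData(spe)$in_tissue == 1], which drops the spots from the counts, the spot metadata and the spatial coordinates together.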

One way to quality “trim” the dataset is to calculate the percentage of mitochondrial gene expression per spot and store this information in the colData.

First, identify the mitochondrial genes: their gene names start with “MT-” or “mt-”.

##  [1] "MT-ND1"  "MT-ND2"  "MT-CO1"  "MT-CO2"  "MT-ATP8" "MT-ATP6" "MT-CO3" 
##  [8] "MT-ND3"  "MT-ND4L" "MT-ND4"  "MT-ND5"  "MT-ND6"  "MT-CYB"
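A small sketch of the name-based lookup, using a handful of gene names that appear in the outputs above plus MTOR, which is added here only as a decoy to show why the pattern is anchored at the start and includes the hyphen.

```r
# Gene names taken from the gene metadata and mitochondrial gene outputs,
# plus "MTOR" as a decoy (it starts with "MT" but is not mitochondrial)
gene_name <- c("MIR1302-2HG", "FAM138A", "OR4F5", "MTOR",
               "MT-ND1", "MT-CO1", "MT-CYB")

# TRUE for names that start with "MT-" or "mt-"
is_mito <- grepl("^(MT|mt)-", gene_name)

gene_name[is_mito]
```

The resulting logical vector has one entry per gene, so it can be used directly to select rows of the counts table.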

Proportion of mitochondrial reads in each spot’s library

## DataFrame with 6 rows and 13 columns
##                            barcode_id     sample_id in_tissue array_row
##                           <character>   <character> <integer> <integer>
## AAACAAGTATCTCCCA-1 AAACAAGTATCTCCCA-1 sample_151673         1        50
## AAACAATCTACTAGCA-1 AAACAATCTACTAGCA-1 sample_151673         1         3
## AAACACCAATAACTGC-1 AAACACCAATAACTGC-1 sample_151673         1        59
## AAACAGAGCGACTCCT-1 AAACAGAGCGACTCCT-1 sample_151673         1        14
## AAACAGCTTTCAGAAG-1 AAACAGCTTTCAGAAG-1 sample_151673         1        43
## AAACAGGGTCTATATT-1 AAACAGGGTCTATATT-1 sample_151673         1        47
##                    array_col ground_truth cell_count       sum  detected
##                    <integer>  <character>  <integer> <numeric> <numeric>
## AAACAAGTATCTCCCA-1       102       Layer3          6      8458      3586
## AAACAATCTACTAGCA-1        43       Layer1         16      1667      1150
## AAACACCAATAACTGC-1        19           WM          5      3769      1960
## AAACAGAGCGACTCCT-1        94       Layer3          2      5433      2424
## AAACAGCTTTCAGAAG-1         9       Layer5          4      4278      2264
## AAACAGGGTCTATATT-1        13       Layer6          6      4004      2178
##                    subsets_mito_sum subsets_mito_detected subsets_mito_percent
##                           <numeric>             <numeric>            <numeric>
## AAACAAGTATCTCCCA-1             1407                    13              16.6351
## AAACAATCTACTAGCA-1              204                    11              12.2376
## AAACACCAATAACTGC-1              430                    13              11.4089
## AAACAGAGCGACTCCT-1             1316                    13              24.2223
## AAACAGCTTTCAGAAG-1              651                    12              15.2174
## AAACAGGGTCTATATT-1              621                    13              15.5095
##                        total
##                    <numeric>
## AAACAAGTATCTCCCA-1      8458
## AAACAATCTACTAGCA-1      1667
## AAACACCAATAACTGC-1      3769
## AAACAGAGCGACTCCT-1      5433
## AAACAGCTTTCAGAAG-1      4278
## AAACAGGGTCTATATT-1      4004
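The relationship between the columns can be checked by hand: the mitochondrial percentage is the mitochondrial count total divided by the library size, times 100. The sketch below recomputes it for the first three spots of the table; the column name mito_percent is hypothetical. The table itself looks like the output of a per-cell QC helper such as addPerCellQC() from scuttle (re-exported by scater) called with a mito subset, but that is an assumption, since the code is not shown.

```r
# `sum` and `subsets_mito_sum` copied from the first three rows of the table above
qc <- data.frame(
  sum              = c(8458, 1667, 3769),
  subsets_mito_sum = c(1407, 204, 430),
  row.names = c("AAACAAGTATCTCCCA-1", "AAACAATCTACTAGCA-1", "AAACACCAATAACTGC-1")
)

# Percentage of each spot's counts that come from mitochondrial genes
qc$mito_percent <- 100 * qc$subsets_mito_sum / qc$sum

round(qc$mito_percent, 4)
```

Rounded to four decimals, these values should agree with the subsets_mito_percent column for the same three spots.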

Library size threshold plot

In the current plot, the library sizes look good and evenly distributed.

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The horizontal red line (argument threshold in the plotQC function) shows a first guess at a possible filtering threshold for library size based on the above histogram.

## `geom_smooth()` using formula = 'y ~ x'
## `stat_xsidebin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_ysidebin()` using `bins = 30`. Pick better value with `binwidth`.

Alternative plot using ggplot

## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

It is important to look at the number of spots that are left out of the dataset by this choice of cut-off value, and to look at their putative spatial patterns. If we have filtered out spots with biological relevance, then we should observe some patterns on the tissue map that correlate with some of the known biological structures of the tissue. If we do observe such a phenomenon, we have probably set our threshold too high (i.e. not permissive enough).

## qc_lib_size
## FALSE  TRUE 
##  3628    11
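A sketch of how such a flag and its tally can be produced. The library sizes are the six sum values from the QC table above; the cut-off of 2000 is hypothetical and was picked only so that one of these six spots is flagged, since the threshold actually used for the tally above is not stated.

```r
# Library sizes (`sum` column) for the six spots shown in the QC table
lib_size <- c(8458, 1667, 3769, 5433, 4278, 4004)
names(lib_size) <- c("AAACAAGTATCTCCCA-1", "AAACAATCTACTAGCA-1",
                     "AAACACCAATAACTGC-1", "AAACAGAGCGACTCCT-1",
                     "AAACAGCTTTCAGAAG-1", "AAACAGGGTCTATATT-1")

# Hypothetical cut-off, for illustration only
threshold <- 2000

# TRUE marks a spot whose library size falls below the cut-off
qc_lib_size <- lib_size < threshold

# How many spots would be kept (FALSE) and discarded (TRUE)
table(qc_lib_size)

# Barcodes of the flagged spots, for inspecting where they sit on the tissue
names(lib_size)[qc_lib_size]
```

Keeping the flag as a logical vector, rather than filtering straight away, makes it easy to colour the flagged spots on the tissue map and check for spatial patterns before anything is removed.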

Number of expressed genes

Plot a histogram of the number of expressed genes across spots. A gene is “expressed” in a spot if it has at least one count in it.

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
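The definition of “expressed” above translates into a one-line count per spot. The sketch uses a made-up 3 x 3 toy matrix with placeholder gene and spot names; on the real data the same quantity is the detected column of the QC table.

```r
# Toy counts (made-up values): rows are genes, columns are spots
toy_counts <- matrix(
  c(0, 2, 0,
    1, 0, 0,
    3, 1, 4),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("gene_a", "gene_b", "gene_c"),
                  c("spot_1", "spot_2", "spot_3"))
)

# A gene counts as expressed in a spot if it has at least one count there,
# so the number of expressed genes is the number of non-zero entries per column
detected <- colSums(toy_counts > 0)

detected
```

Note that this measure ignores how many counts a gene has: gene_c contributes one detected gene to spot_3 whether it has 4 counts or 400.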

Plot number of expressed genes vs. number of cells per spot

## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

We apply the chosen threshold to flag spots with (in this case) fewer than 500 expressed genes.

Select expressed genes threshold

## qc_detected
## FALSE  TRUE 
##  3628    11

Percentage of mitochondrial expression

Density and histogram of percentage of mitochondrial expression

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Plot mitochondrial read proportion vs. number of cells per spot

## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
